@jpculp jpculp commented Oct 17, 2025

Issue #, if available:

Description of changes:

Adds a simple workload test case that runs a command or script within a test container.

Sample output from a multi-test run with a Python-based "Hello world" container and a Bash-based NVIDIA smoke test container:

2025/11/21 20:48:37 Starting workload test suite...
=== RUN   TestWorkload
=== RUN   TestWorkload/hello-python
    workload_test.go:107: Creating hello-python job
    workload_test.go:112: hello-python job created successfully
=== RUN   TestWorkload/hello-python/Job_succeeds
    workload_test.go:119: Waiting for hello-python job to complete
=== NAME  TestWorkload/hello-python
    workload_test.go:136: Test log for hello-python:
    workload_test.go:137: hello, python

--- PASS: TestWorkload (10.64s)
    --- PASS: TestWorkload/hello-python (10.64s)
        --- PASS: TestWorkload/hello-python/Job_succeeds (10.01s)
PASS

...

2025/11/21 20:48:47 Starting workload test suite...
=== RUN   TestWorkload
=== RUN   TestWorkload/nvidia-smoke
    workload_test.go:105: Creating nvidia-smoke job with resources: map[nvidia.com/gpu:1]
    workload_test.go:112: nvidia-smoke job created successfully
=== RUN   TestWorkload/nvidia-smoke/Job_succeeds
    workload_test.go:119: Waiting for nvidia-smoke job to complete
=== NAME  TestWorkload/nvidia-smoke
    workload_test.go:136: Test log for nvidia-smoke:
    workload_test.go:137:

...

--- PASS: TestWorkload (30.57s)
    --- PASS: TestWorkload/nvidia-smoke (30.57s)
        --- PASS: TestWorkload/nvidia-smoke/Job_succeeds (30.01s)
PASS

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Contributor

Just a nit on the naming: "workload" is pretty broad. Could we pick a clearer name with a theme like "jobTemplate"? I'm really open to suggestions, but I want to make sure this is sane and immediately readable.

Author

I think I chose "workload" since I came from using tools like Sonobuoy. I'm not sure about "jobTemplate", but I'm open to other suggestions if folks really don't like referring to these types of jobs as workloads. I may have to close this PR and open a new one to change the branch name though.


jpculp commented Nov 20, 2025

Finally coming back to this, but starting off with a simple rebase.

@jpculp jpculp marked this pull request as draft November 20, 2025 21:33

jpculp commented Nov 21, 2025

Addressed feedback from @ndbaker1 while adding support for Neuron- and NVIDIA-accelerated jobs.

@jpculp jpculp requested review from mselim00 and yeazelm November 21, 2025 04:05
@jpculp jpculp marked this pull request as ready for review November 21, 2025 04:05
@mselim00 mselim00 left a comment

nice feature, some minor comments

Comment on lines 26 to 29
workloadTestCommand = flag.String("workloadTestCommand", "", "command for workload test")
workloadTestImage = flag.String("workloadTestImage", "", "image for workload test")
workloadTestName = flag.String("workloadTestName", "workload-test", "name for workload test")
workloadTestAccelerator = flag.String("workloadTestAccelerator", "", "accelerator for workload test: neuron, nvidia")
Contributor

nit: Is the workloadTest prefix redundant here? The caller would already be executing a workload.test binary.

Author

It kind of is, but I was just following the style of the existing neuron and nvidia tests that had neuronTestImage and nvidiaTestImage.

Comment on lines 75 to 77
if *workloadTestName == "" {
	t.Fatal("workloadTestName must be set to run the test")
}
Contributor

I don't think this needs to be required. It wouldn't really impact functionality, and there is a default, right?

Author

This ends up being the name of the feature. While it's true we set a default, for these kinds of things I prefer a belt and suspenders type of approach.
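
To illustrate the point in this thread: a flag default only applies when the flag is omitted, so an explicitly empty value still gets through, which is the case the extra check guards against. The following is a minimal sketch, assuming a standalone `flag.FlagSet`. The flag name and error message are taken from the PR, while the `parseName` helper is hypothetical and not part of the change.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// parseName is a hypothetical helper: it registers the name flag with a
// default on a fresh FlagSet and rejects an explicitly empty value.
func parseName(args []string) (string, error) {
	fs := flag.NewFlagSet("workload", flag.ContinueOnError)
	name := fs.String("workloadTestName", "workload-test", "name for workload test")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	// The default covers a missing flag, but not "-workloadTestName=".
	if *name == "" {
		return "", errors.New("workloadTestName must be set to run the test")
	}
	return *name, nil
}

func main() {
	// No flag given: the default applies.
	fmt.Println(parseName(nil))
	// Explicit empty value: the default does not apply, so the check fires.
	fmt.Println(parseName([]string{"-workloadTestName="}))
}
```

In the actual test the same check calls `t.Fatal` instead of returning an error.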


jpculp commented Nov 21, 2025

Addressed feedback from @mselim00, such as:

  • Replace workloadTestAccelerator with workloadTestResources for more flexible resource definitions.
  • Add workloadTestTimeout (with a default of 10m).
  • Allow workloadTestCommand to be optional.
  • Other small optimizations.
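
As a rough illustration of what a more flexible resource definition could look like, the sketch below parses a single string into a map of resource names to counts. The comma-separated `name=count` syntax and the `parseResources` helper are assumptions made for this example. The thread does not show the format that workloadTestResources actually accepts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseResources turns a spec such as "nvidia.com/gpu=1,example.com/foo=2"
// into a map. The name=count syntax is an assumption for illustration, not
// necessarily the format used by the PR.
func parseResources(spec string) (map[string]int64, error) {
	resources := make(map[string]int64)
	if strings.TrimSpace(spec) == "" {
		return resources, nil // no extra resources requested
	}
	for _, pair := range strings.Split(spec, ",") {
		name, value, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok || name == "" {
			return nil, fmt.Errorf("invalid resource %q: want name=count", pair)
		}
		count, err := strconv.ParseInt(value, 10, 64)
		if err != nil || count < 0 {
			return nil, fmt.Errorf("invalid count for %q: %q", name, value)
		}
		resources[name] = count
	}
	return resources, nil
}

func main() {
	res, err := parseResources("nvidia.com/gpu=1")
	fmt.Println(res, err)
}
```

A generic map like this is what lets one flag cover both accelerator types, and it is consistent with the `map[nvidia.com/gpu:1]` shown in the sample log above.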

@jpculp jpculp requested a review from mselim00 November 21, 2025 20:56
Signed-off-by: Patrick J.P. Culp <jpculp@amazon.com>

jpculp commented Nov 21, 2025

Addressed some additional feedback:

  • Prune resources that are 0.
  • Take advantage of Duration instead of forcing minutes.
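
The two points above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's code: the `pruneZero` helper, the `example.com/unused` resource name, and the use of a plain `map[string]int64` are hypothetical, while the flag name and the 10m default come from the earlier comment.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// pruneZero (hypothetical helper) drops entries whose requested count is 0
// so they are not passed on as empty resource requests. Deleting map
// entries while ranging over the map is safe in Go.
func pruneZero(resources map[string]int64) {
	for name, count := range resources {
		if count == 0 {
			delete(resources, name)
		}
	}
}

func main() {
	fs := flag.NewFlagSet("workload", flag.ContinueOnError)
	// A Duration flag accepts values such as "90s", "10m" or "1h30m",
	// so callers are not limited to whole minutes.
	timeout := fs.Duration("workloadTestTimeout", 10*time.Minute, "timeout for workload test")
	if err := fs.Parse([]string{"-workloadTestTimeout=90s"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}

	resources := map[string]int64{"nvidia.com/gpu": 1, "example.com/unused": 0}
	pruneZero(resources)
	fmt.Println(*timeout, resources)
}
```

With a Duration flag, the unit is part of the value, so no separate "minutes" convention has to be documented or converted.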

@jpculp jpculp requested a review from mselim00 November 21, 2025 23:17
@mselim00 mselim00 left a comment

nice!
